WHAT THE FACE REVEALS 


Basic and Applied Studies of 
Spontaneous Expression Using the 
Facial Action Coding System (FACS) 


Edited by
PAUL EKMAN &
ERIKA L. ROSENBERG


New York  Oxford
Oxford University Press  1997


15

Facial Expression in
Affective Disorders


PAUL EKMAN, DAVID MATSUMOTO & WALLACE V. FRIESEN 


Recent progress in the study of facial expressions of emotion in normal indi- 
viduals could have relevance to clinical investigations of depression. Universal facial 
expressions of emotion have been identified (Ekman, 1982a; Izard, 1971; Fridlund,
Ekman, & Oster, 1987; Ekman & Friesen, 1986) for seven emotions (anger, contempt,
disgust, fear, sadness, surprise, and happiness) that can be objectively and reliably
distinguished one from another (Ekman, 1982b). Whether a disturbance in emotion is seen
as a symptom (Beck, 1967; Abramson, Garber, Edwards, & Seligman, 1978) or as central
in the etiology (Izard, 1977; Tomkins, 1963) of depression, the precise measurement
of facial expressions of emotion could be useful in clinical investigations and perhaps
also in the treatment of depression and other affective disorders. Specifying which
of the seven emotions are evident in facial expressions, their relative strength, and any 
repetitive sequences of these emotional expressions might help to refine diagnosis, 
could be of aid in monitoring response to treatment, and might help to predict the like- 
lihood of subsequent improvement or relapse. 

The recent spate of studies using electromyographic (EMG) techniques to measure 
facial activity in depressed patients has shown such a potential (Carney, Hong, O'Connell,
& Amado, 1981; Schwartz et al., 1976a, 1976b, 1978; Teasdale & Bancroft, 1977;
Teasdale & Rezin, 1978), but there are two inescapable problems in using EMG to measure
facial expressions. The first problem is inherent in the attachment of the EMG
leads to the surface of the face. Measurement by this means is not only obtrusive but 
also may inhibit facial expressivity. Second, because many of the muscles active in diverse
emotional states lie one on top of the other or are very close to each other, it is
not possible to measure with EMG the occurrence of all seven emotions signaled by the
face, any of which may be implicated in depression (Ekman, 1982b; Fridlund & Izard,
1983).

An alternative approach to measuring facial expressions of emotion is through sys- 
tematically examining videotaped or filmed records to identify the muscular move- 
ments that constitute the emotional expressions. This approach is totally unobtrusive; 
the camera even may be hidden. The Facial Action Coding System (FACS) (Ekman & 
Friesen, 1976, 1978) is the only comprehensive, objective measurement technique 
using this approach. Measuring facial movement in muscular terms, FACS identifies all
seven emotions, the relative strength of each, repetitive sequences, and whether the expression
is voluntary or involuntary. We also used an abbreviated, less costly version of
FACS, the Emotion Facial Action Coding System (EMFACS) (Friesen & Ekman, 
1986), which focuses just on muscular movements relevant to emotion. 

In this first attempt to evaluate the potential of these measures of visible facial 
behavior in studies of affective disorders, we did not gather new interview materials. 
Instead, to reduce the expense of this exploratory study, we have analyzed archival in- 
terview records and a more recent but small sample of interviews. Although method- 
ological limits in these samples constrain the conclusions that can be drawn, the analy- 
sis of these two samples illustrates some of the questions that can be asked by precisely 
measuring facial expressions: Do facial behaviors vary with diagnosis? Are such differences
apparent only between depressives and schizophrenics, or do the emotional expressions
distinguish major from minor depression? Are there sufficient differences
among patients with the same diagnosis to suggest the possibility of using such mea- 
sures to subclassify or refine diagnoses? Do the facial expressions predict the extent of 
subsequent clinical improvement, and would predictions based on such measures add 
information not ordinarily derived from standard clinical ratings of patient behavior? 


Methods 
Patients 


Langley Porter Sample 


In 1964, two of us (Ekman and Friesen) filmed all of the patients admitted with depres- 
sive symptoms during a six-month period to two wards of the Langley Porter Institute. 
A short standardized interview was conducted at the time of admission and again near 
the time of discharge. Seventeen patients had a final diagnosis of depression (8 neu- 
rotic, 9 psychotic), based on a consensus of the attending physician and ward head. We 
had three clinical psychologists independently evaluate these patients on the Brief Psy- 
chiatric Rating Scale (BPRS) (Overall & Gorham, 1962) after viewing the first part of 
the sound film admission interview, and then again after viewing the discharge inter- 
view. They also evaluated the extent of general improvement in psychiatric status on a 
7-point scale on which 1 indicated regression, 4 no improvement, and 7 maximal improvement.
We used the means of their ratings on the BPRS and on improvement in the
data analyses.


George Washington Sample 


As part of the NIMH Clinical Research Branch Collaborative Program on the Psy- 
chobiology of Depression (Katz et al., 1984), we were furnished interviews with 12 pa- 
tients to study the feasibility of using our measures of facial expression in studies of af- 
fective disorders. The interviews were conducted when the patients were admitted to 
George Washington University Hospital and then again shortly before discharge. Their 
diagnoses were: 4 major depressive, 3 minor depressive, 2 bipolar disorder manic, and
2 schizophrenic. We selected two portions of the hour-long interviews for measurement:
the first eight questions, which largely dealt with how the patient currently felt;
and the last five questions, which focused on diverse matters such as hobbies, whether 
the patient enjoyed life, and feelings about the future. The scores from both portions 
were combined in the data analyses. Because these patients did not change from admis- 
sion to discharge—the mean difference in ratings of their pathology was less than half a 
point on a 7-point scale—we could not use this sample in our analysis of whether facial 
expressions at admission predict improvement. 


Scoring Visible Facial Movement 


The two techniques we used—FACS and EMFACS—are both anatomically based ob- 
jective techniques for measuring facial movement observed on videotape or film. Using 
either technique, a coder “dissects” an observed expression, decomposing it into the
specific facial muscles that produced the movement. The scores for a facial expression
consist of the list of muscular actions that produced it. In FACS (but not in EMFACS),
the precise duration of each action also is determined, and the intensity of each muscular
action and any bilateral asymmetry are rated. EMFACS is more economical to use
than FACS, not only because it disregards these aspects of facial movement but also 
because EMFACS scores only some, not all, observed facial activity. Using EMFACS, 
a coder decomposes an expression only when one of 33 predefined combinations of 
muscular actions is observed. These 33 predefined combinations include all of the fa- 
cial configurations that have been established empirically (Ekman & Friesen, 1975, 
1978) to signal the seven emotions that have universal expressions. 

The scoring units in both FACS and EMFACS are descriptive, involving no infer- 
ences about emotions. For example, the scores for an expression would be that the 
inner corners of the eyebrows are pulled up and together by the combined action of 
medial frontalis and corrugator muscles, not that the eyebrows are in a sad position. 
While data analyses can be done on these purely descriptive muscular scores, FACS or 
EMFACS scores can also be converted by a computer-stored dictionary into emotion 
scores. 
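
The dictionary step can be illustrated with a minimal sketch. It assumes that a scored expression is simply a set of muscle names and that the dictionary maps whole sets to emotion labels. The three entries are toy examples taken from muscle actions named in this chapter (the smile entries follow the muscle criterion described under Felt and Unfelt Happy Expressions); they are not the actual FACS or EMFACS dictionary, and the names `EMOTION_DICTIONARY`, `emotion_for`, and `tally_emotions` are hypothetical.

```python
# Toy dictionary: each key is a set of muscle actions, each value an emotion label.
# Entries are illustrative only; this is NOT the actual FACS/EMFACS dictionary.
EMOTION_DICTIONARY = {
    frozenset({"medial frontalis", "corrugator"}): "sadness",
    frozenset({"zygomatic major", "orbicularis oculi"}): "felt happiness",
    frozenset({"zygomatic major"}): "unfelt happiness",
}


def emotion_for(actions):
    """Return the emotion label for one scored expression, or None if unlisted."""
    return EMOTION_DICTIONARY.get(frozenset(actions))


def tally_emotions(scored_expressions):
    """Count emotion labels over a list of descriptively scored expressions."""
    counts = {}
    for actions in scored_expressions:
        label = emotion_for(actions)
        if label is not None:
            counts[label] = counts.get(label, 0) + 1
    return counts
```

Keeping the descriptive scores and the emotion labels in separate layers mirrors the point made above: analyses can be run on the muscular scores alone, and the interpretive step can be revised without rescoring.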

Coders spend approximately 100 hours learning FACS. Self-instructional materi- 
als teach the anatomy of facial activity, or how muscles singly and in combination 
change the appearance of the face. Prior to using FACS all learners are required to 
score a videotaped test (provided by Ekman and Friesen), to ensure they are measuring
facial behavior in agreement with prior learners. To date more than 300 people have 
achieved high inter-coder agreement on this test in their measurement of facial behav- 
ior. Once a coder has learned FACS it takes only a few hours to become familiar with 
the procedures for using EMFACS. 


Felt and Unfelt Happy Expressions 


In addition to providing scores on the incidence of anger, disgust, fear, surprise, sad- 
ness, and contempt, FACS or EMFACS scores can be interpreted in a way that pur- 
portedly distinguishes whether an expression was made involuntarily and therefore is 
presumably a sign that the emotion displayed was felt, or was made voluntarily, and 
was not a sign of felt emotion. The idea that voluntary facial expressions could be dis- 
tinguished from involuntary expressions is consistent with what is known about the 
neural control of facial expression. Voluntary actions are controlled by corticobulbar
pathways that emanate from the precentral motor cortex, while extrapyramidal influ- 
ences are thought to control involuntary emotional facial movements. This “dual” 
control of facial expression is shown by the differential consequences of lesions in 
these two areas, which can impair voluntary but leave involuntary expressions in- 
tact, or impair involuntary expressions but leave voluntary expressions intact (Rinn, 
1984). 

Ekman and Friesen have proposed a number of ways to distinguish voluntary 
from involuntary expressions. We will consider only their hypotheses about how to 
make this distinction in regard to smiles, since the evidence to support this distinction
for other emotional expressions is still quite sparse. 

1. Asymmetry. When happiness is not felt but a smile is made voluntarily—to be 
polite, show agreement, encourage or mislead another—the action of the zygomatic 
major muscle that pulls the lip corners upwards will be stronger on one side of the face 
(Ekman, 1980). The smile will not be totally absent on the other side of the face, just 
weaker, often only slightly weaker. With right-handed persons, the unfelt smile is 
stronger on the left side of the face (Ekman, Hager, & Friesen, 1981; Hager & Ekman,
1985). 

2. Duration. Inspection of thousands of spontaneous expressions revealed that
nearly all of them last between 0.5 and 4 seconds. On this basis unfelt expressions
were hypothesized to last either less than 0.5 seconds or longer than 4 seconds (Ekman &
Friesen, 1982).

3. Muscles. Duchenne (1862/1990) suggested that felt happy expressions include 
not only the activity of the zygomatic major muscle but the action of the orbicularis 
oculi muscle, which pulls the skin around the eyes inwards toward the eyeball. When 
happiness is not felt but a smile is assumed voluntarily, the zygomatic major would be 
active but not the orbicularis oculi. 
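
The three hypothesized markers can be sketched as a simple check on a scored smile. This is only an illustration: the `Smile` record, its field names, and the integer intensity values are hypothetical, and the duration bounds are assumed to be in seconds. The sketch reports which markers are present and does not decide how they should be combined, since the two samples in this chapter relied on different subsets of them.

```python
from dataclasses import dataclass


@dataclass
class Smile:
    duration_s: float         # total duration of the smile, in seconds
    left_intensity: int       # zygomatic major intensity, left side of the face
    right_intensity: int      # zygomatic major intensity, right side of the face
    orbicularis_oculi: bool   # True if orbicularis oculi was also active


def unfelt_markers(smile, min_s=0.5, max_s=4.0):
    """List which of the three hypothesized markers of an unfelt smile are present."""
    markers = []
    if smile.left_intensity != smile.right_intensity:
        markers.append("asymmetry")
    if smile.duration_s < min_s or smile.duration_s > max_s:
        markers.append("duration")
    if not smile.orbicularis_oculi:
        markers.append("muscles")
    return markers
```

For example, `unfelt_markers(Smile(2.0, 3, 3, True))` returns an empty list, while a six-second, lopsided smile without orbicularis oculi activity returns all three markers.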

A number of lines of evidence support this distinction between felt and unfelt 
smiles. Felt smiles occurred more often when subjects watched pleasant nature films, 
while unfelt smiles were more frequent when the subjects watched unpleasant surgi- 
cal scenes (Ekman, Friesen, & Ancoli, 1980; Ekman, Davidson, & Friesen, 1990). 
Felt smiles occurred more often in response to a joke; unfelt smiles when the same 
subjects were asked to smile deliberately (Ekman et al., 1981; Hager & Ekman, 
1985). There was evidence of differences in the pattern of hemispheric activation 
asymmetry during the expression of felt versus unfelt smiles (Ekman, Davidson & 
Friesen, 1990). Ten-month-old infants showed a greater number of felt smiles in response
to the mother’s approach as compared to a stranger’s approach and a greater
number of unfelt smiles in response to the stranger as compared to the mother (Fox &
Davidson, 1987).


Scoring Procedures 


The coders were not told the diagnosis of the patient, nor whether it was an admission 
or discharge interview. Table 15.1 lists how the scoring was done. Intercoder reliability 
was evaluated by computing the correlation (Spearman rank order) between the frequencies
with which two coders scored the specific facial actions. The correlation between
the coders using FACS was .87, and between the coders using EMFACS was .99;
both correlations were significant beyond the .01 level of confidence.
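
A rank-order reliability coefficient of this kind can be computed as a Pearson correlation on ranks. The sketch below is a plain-Python illustration; the frequency counts for the two coders are hypothetical and are not data from either sample.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman rank-order correlation: Pearson r computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical frequencies with which two coders scored five facial actions.
coder_a = [12, 7, 3, 9, 1]
coder_b = [11, 8, 2, 6, 0]
rho = spearman(coder_a, coder_b)
```

Because only ranks enter the computation, a coder who scores every action somewhat more often than a colleague can still show high agreement, provided the two order the actions in the same way.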

TABLE 15.1. How the facial expressions were scored in the two samples

                                          Samples
                             George Washington      Langley Porter
                             N = 12                 N = 17

Scoring procedure            EMFACS, FACS           EMFACS
How felt-unfelt smiles       Orbicularis oculi      Orbicularis oculi
  were distinguished         & duration
Number of coders             2                      2


Results 


Can Facial Behavior at Admission Predict Clinical Improvement? 


We begin with this question because it is the only one for which there was sufficient data to 
compute any statistics. Only the Langley Porter data were considered, as there was virtu- 
ally no improvement in the George Washington sample. Although EMFACS measures fa- 
cial behaviors relevant to a range of emotions, only behaviors relevant to sadness, con- 
tempt, felt happiness, and unfelt happiness occurred often enough in the interviews to be 
analyzed. Because of the small samples of neurotics and psychotics, and because this diagnostic
distinction is not preserved in DSM III, diagnosis was disregarded in the analysis.
The first step was to calculate Pearson product-moment correlations between the EM- 
FACS scores from the admission interview and the clinicians’ ratings of improvement by 
the time of discharge. As a precaution against inflated correlations owing to the undue influence
of a few extreme scores, Spearman rank-order correlations and Kendall nonparametric
coefficients were also calculated. The confidence levels reported below for the Pearson
correlations were also obtained with the two nonparametric correlations.
Table 15.2 shows that both contempt and unfelt happiness during the admission
interview were negatively correlated with the extent of clinical improvement by the
time of the discharge interview. For comparison, the table also shows that the correlation
between the BPRS total pathology score at time of admission with later improvement
nearly reached significance. The patients who showed more contempt or unfelt
happiness, or who were rated on the BPRS as being more pathological, improved less
than other patients.

TABLE 15.2. Predicting clinical improvement

                                  Correlation with improvement
                                  at time of discharge
Admission measure                 Correlation               p

Contempt expression               Pearson r = -.538         .05
Unfelt happiness expression       Pearson r = -.600         .01
BPRS pathology rating             Pearson r = .458          .06
Contempt + unfelt happy           Multiple R = .794         .001
Contempt + unfelt happy +
  BPRS pathology                  Multiple R = .818         .001

Note: Correlations are between measures at the time of admission to the hospital
and clinical improvement rated at the time of discharge.
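
As a rough check on the significance levels in Table 15.2, a Pearson r can be converted to a t statistic with n - 2 degrees of freedom. The sketch below assumes a two-tailed test, which the text does not state, and uses the standard tabled critical value of 2.131 for 15 degrees of freedom at the .05 level. The function name `t_for_r` is ours.

```python
import math


def t_for_r(r, n):
    """t statistic for testing a Pearson r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))


N = 17              # Langley Porter patients
T_CRIT_05 = 2.131   # two-tailed critical t at the .05 level, 15 df (assumed test)

for label, r in [("contempt", -0.538), ("unfelt happiness", -0.600), ("BPRS", 0.458)]:
    t = t_for_r(r, N)
    print(f"{label}: t(15) = {t:.2f}, beyond .05 criterion: {abs(t) > T_CRIT_05}")
```

Under these assumptions the two facial measures exceed the .05 criterion and the BPRS rating falls just short of it, which agrees with the pattern reported in the table.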

Regression analyses were employed to determine whether the contempt and unfelt 
happy scores were accounting for the same or independent variance in predicting clini- 
cal improvement, and whether the admission BPRS predicted a significant amount of 
variance in clinical improvement beyond that accounted for by the facial scores. Table 
15.2 shows that entering both unfelt happy and contempt scores in the regression equation
produced a large increase in the correlation with improvement. Addition of the admission
BPRS, however, raised the coefficient from .794 to only .818, an R² change of
only 3.9% (F(1,13) = 1.538, ns). Thus admission BPRS could not predict variance in
general improvement ratings beyond that accounted for by facial scores at admission. 
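
The test for this increment can be sketched from the rounded coefficients in Table 15.2. The function below is the usual F test for a change in R² between nested regression models; the name `r2_change_f` and its parameters are ours. Because the inputs are rounded to three decimals, the result should be expected to approximate rather than exactly reproduce the reported statistic.

```python
def r2_change_f(r_reduced, r_full, n, k_full, k_added):
    """F test for the increase in R^2 when k_added predictors join a model.

    r_reduced, r_full: multiple correlations before and after the addition
    n: number of cases; k_full: number of predictors in the full model
    """
    delta = r_full ** 2 - r_reduced ** 2
    df_error = n - k_full - 1
    f = (delta / k_added) / ((1 - r_full ** 2) / df_error)
    return delta, f, (k_added, df_error)


# Adding admission BPRS to contempt + unfelt happiness (rounded values from Table 15.2).
delta, f, df = r2_change_f(r_reduced=0.794, r_full=0.818, n=17, k_full=3, k_added=1)
print(f"R^2 change = {delta:.3f}, F{df} = {f:.2f}")
```

With 17 patients and three predictors in the full model the error term has 13 degrees of freedom. The computed change in R² is about .039 and the F ratio is about 1.5, close to the value given above.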

Entering the BPRS in the equation first and then adding the contempt and unfelt 
happiness scores provided an estimate of how much of the variance independent of the 
BPRS is added by the facial scores. Inclusion of the facial scores accounted for 46% 
more of the variance in improvement ratings than the BPRS alone, and this change was 
statistically significant (F(2, 13) = 36.164, p < .001). Thus facial scores of unfelt hap- 
piness and contempt predicted a significant amount of variance in clinical improvement 
ratings beyond that accounted for by admission BPRS. We inspected the distribution
of emotion scores among the neurotic and psychotic subgroups to see if the scores that
predicted improvement (unfelt happiness and contempt) might have been evident in
just one of the two groups. That was not the case. Among both neurotics and psychotics,
some patients showed these two expressions.

Because age differences and length of interview could have confounded the results, 
correlations were calculated to examine the relationship between these variables and the 
facial scores. There were no significant correlations for either of these two variables,
indicating that the results we report cannot be due to age or length of interview differences.

Problems with both the measure of clinical improvement and the BPRS should be 
noted. Both measures were based on clinicians’ judgments after viewing the first por- 
tions of the sound film interviews. Thus, the clinicians making these evaluations were 
exposed to some of the same facial expressions (in addition to the speech, voice, and body
movement) that were measured with EMFACS. Such an overlap might have inflated 
the correlation between EMFACS and the measure of clinical improvement. Two argu- 
ments mitigate, at least in part, this criticism. First, we recomputed the correlations uti- 
lizing only the facial behavior in the last half of the interviews that had not been seen 
by the clinicians who had rated improvement, so there would be independence between 
the behavior measured with EMFACS and the behavior viewed for the global judgment 
of improvement. The correlations remained at the same level of significance. Second, 
the lack of independence was even greater between the BPRS and the improvement 
measures than between either of these and EMFACS, since these two judgments not 
only were based on viewing the same behavior sample but were made by the very same 
clinicians at the very same time. Yet EMFACS was more correlated with improvement 
than was the BPRS. In any case, future studies should employ a more independent 
measure of clinical improvement, based on judgments made in settings different from 
the one in which the facial behavior is measured. The same stricture would apply to the 
sample of behavior on which the BPRS is based. It would be sensible to include also
other standard clinical ratings, not just the BPRS. 


Does Facial Behavior Differ Between Diagnostic Groups? 


To examine this question we utilized the George Washington rather than the Langley 
Porter interviews. Even though there were very few patients in this sample, the diag- 
noses were current and there were four diagnostic groups to compare. The correlations 
among the frequency of an emotional expression, its duration, and mean duration were 
above .90. We report in figure 15.1 the mean duration for six emotions plus unfelt hap- 
piness. (Contempt is not included as a separate category because we did not distinguish 
it from disgust, as we do now, when this scoring was done.) 

Figure 15.1 shows that the major depressives showed more sadness and disgust 
and less unfelt happiness than minor depressives. The manics showed more felt happiness
and unfelt happiness, and less anger, disgust, or sadness than either depression group.
The schizophrenics differed from the manics and the depressives in showing more fear 
and less of all the other emotions. 


Does Facial Behavior Differ Within Diagnostic Groups? 


Figure 15.2 shows the mean duration scores for the four major depressive patients. Pa- 
tient 38 stands out from the others in showing no sadness, while Patient 10 differs from 
the others in showing much more unfelt happiness and disgust. Figure 15.3 shows the 
mean duration scores for the three minor depressives. Again there appear to be individual 
differences in emotional expressions among patients with the same diagnosis. While simi- 
lar in sadness, the patients differ in regard to fear and in regard to the happy expressions. 

Figure 15.1 Mean duration of affect for psychiatric groups.

Figure 15.2 Mean duration of affect for major depressives.

Figure 15.3 Mean duration of affect for minor depressives.


The Relationship Between FACS and EMFACS Scoring


Most clinical investigators would choose to use EMFACS rather than FACS simply be- 
cause it is less costly, if the two measures produce similar results. FACS scoring, on the 
level of detail employed here, takes approximately one hour of scoring for each minute 
scored (a ratio of 60:1, scoring-to-interview time). In the George Washington sample, 
20 minutes of facial behavior was scored from each of 12 patients, requiring 240 hours 
for FACS scoring. In EMFACS scoring, the ratio of scoring to interview time is 6 to 1. 
Scoring the George Washington sample with EMFACS required 24 hours. 
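
The arithmetic behind these totals is simple enough to sketch directly; the function name `scoring_hours` is ours, and the inputs are the figures given above.

```python
def scoring_hours(minutes_per_patient, n_patients, ratio):
    """Hours of coder time, given a scoring-to-interview time ratio."""
    interview_minutes = minutes_per_patient * n_patients
    return interview_minutes * ratio / 60


facs_hours = scoring_hours(20, 12, ratio=60)    # 240.0 hours at 60:1
emfacs_hours = scoring_hours(20, 12, ratio=6)   # 24.0 hours at 6:1
```

The same function gives a quick cost estimate for a planned study once the amount of interview time to be scored per patient is fixed.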

We examined the correlation between FACS and EMFACS scoring to evaluate the 
similarity in the results obtained with these two measures. Different persons scored the 
George Washington interview sample with FACS and with EMFACS. Rank-order
correlations were computed for each scoring category in which at least ten instances
of that type of behavior were scored. The mean rho across the ten scoring categories
that met this requirement was .656. For nine of the scoring categories, the
correlation coefficient was significant beyond the 1% level of confidence. The one scoring
category in which the correlation between EMFACS and FACS scores was not significant
(the action of the risorius muscle) had been the most infrequent, just meeting
our requirement of at least ten occurrences, and had not yielded any differences be- 
tween diagnostic groups. 

Examining the number of times a type of facial behavior was scored by FACS and 
by EMFACS revealed that EMFACS is less sensitive than FACS. EMFACS achieves 
speed in scoring by not allowing the coder to inspect an expression more than three 
times, and by prohibiting slowed motion inspection. But this speed is gained at the cost 
of missing some instances of the behavior to be scored. Whenever it is important to 
know the absolute number of times a patient showed anger, or fear or sadness, and so 
forth, or exactly where in an interview each expression was shown, FACS will remain 
the preferred method. In many clinical studies in which it is important only to make 
relative differentiations, as in the data reported earlier in table 15.2, EMFACS will be 
the method of choice because it is less costly. (Ekman & Fridlund, 1987, have more 
thoroughly contrasted the advantages of each method in studies of affective disorder,
and presented other arguments as to why FACS should be used in exploratory studies
of affective disorders.)


Discussion 


Despite the limits in the samples studied, the results should encourage clinical investi- 
gators to consider measuring facial expressions. At the very least, the results argue for 
repeating these studies with larger samples and better measures of clinical improve- 
ment and psychopathology. If the results were to replicate, measures of facial expres- 
sion could have a variety of uses. 

Facial expressions of emotion could be of use in refining diagnosis and in increas- 
ing the reliability of diagnostic classification. It would be important, for example, in 
new studies to determine whether the facial fear scores could help identify those pa- 
tients in whom anxiety is evident in an otherwise depressed picture. Figures 15.2 and 
15.3 suggest that facial measures may be of use also in distinguishing subgroups 
among patients who share the same diagnosis. Study could then be made of whether 
those subgroups differ in etiology, length of illness, or response to one or another treatment.

The most exciting finding—that facial expression measures predicted later im- 
provement (table 15.2)—must be regarded with caution. The sample size was small (17 
patients) and there were, as discussed earlier, problems with both the improvement 
measure and the comparison with only one standard psychiatric rating scale. Caution, 
however, does not require ignoring the encouraging nature of what was found. Facial 
measures during the acute phase of the disorder did predict the extent of subsequent 
improvement, and the facial measures were more powerful than the Brief Psychiatric 
Rating Scale in predicting improvement, accounting for variance not accounted for by 
the BPRS. In addition to replicating these findings, further research might investigate 
whether facial measures of emotional state can predict response to different forms of
treatment, and whether they are of use in monitoring treatment progress (not just treatment
outcome), in detecting the onset of unwanted side effects with the neuroleptics, and in
predicting relapse.


AFTERWORD


Depression and Expression 


PAUL EKMAN 


The first version of this article was written in 1982 and was rejected for publi- 
cation. We revised it and resubmitted it two more times, and it was again rejected. The 
reviewers complained that the diagnostic categories in sample 1 were no longer relevant,
and that sample 2 was too small. We were sufficiently discouraged that none of
us again did any research on psychopathology and returned instead to basic research on 
emotion and expression. 

I included this article because I am more convinced than ever that it did obtain im- 
portant findings. The fact that facial measures did better than clinical ratings by expert 
clinicians (on the BPRS) in predicting improvement is of import and deserves to be 
replicated. The finding that unfelt happiness (not felt happiness) correlated negatively
with improvement adds to the evidence that this distinction is an important one, a fact borne
out by many of the articles in this book. The finding that contempt is also a negative
predictor of improvement is consistent with more recent findings on expression
and psychopathology reported in this book. And the discovery of individual differences 
in expression among patients with the same diagnosis (in sample 2) also suggests the 
possible value of distinguishing subgroups of patients on the basis of emotional expres- 
sions when evaluating treatment effects. (This finding on individual differences in pa- 
tients with the same diagnosis and the finding reported in the next paragraph were ob- 
tained in the middle 1970s.) 

This article also reports the only evidence on the relationship between FACS and 
EMFACS scoring. This is an important methodological issue for anyone who is considering
facial measurement, since, as the article reports, EMFACS requires only one-tenth
the scoring time of FACS. But this benefit is offset by some costs.
Some emotional expressions detected by FACS are missed by EMFACS. I will reserve 
for the last chapter of this book a more general discussion of when to use FACS, 
EMFACS, or even simpler scoring or rating techniques. 